W3: Data Visualization

Reminder: Penguins

gt::gt(head(penguins))

species  island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year
Adelie   Torgersen            39.1           18.7                181         3750  male    2007
Adelie   Torgersen            39.5           17.4                186         3800  female  2007
Adelie   Torgersen            40.3           18.0                195         3250  female  2007
Adelie   Torgersen              NA             NA                 NA           NA  NA      2007
Adelie   Torgersen            36.7           19.3                193         3450  female  2007
Adelie   Torgersen            39.3           20.6                190         3650  male    2007
  • bill_depth_mm : numeric
  • bill_length_mm : numeric
  • species : character (categorical; stored as a factor in penguins)
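
The column types listed above can be checked directly in R. This is a minimal sketch, assuming penguins comes from the palmerpenguins package (the slides do not say how the data were loaded, and other sources of a penguins dataset may use different column names).

```r
# Assumption: `penguins` is the data frame from the palmerpenguins package.
library(palmerpenguins)

# Class of each column used in these slides
class(penguins$bill_length_mm)  # numeric
class(penguins$bill_depth_mm)   # numeric
class(penguins$species)         # factor, R's type for categorical data

# The categories that a plot of species will show
levels(penguins$species)
```

Knowing whether a column is numeric or categorical is what decides which of the plots below applies to it.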

Common Plots

One Variable

  • Numeric: histogram
  • Character: bar plots

Two Variables

  • Numeric vs. Numeric: Scatterplot, line plot
  • Numeric vs. Character: Box plot

Why focus on these plots?

We build a plot one part at a time

Data +

Mapping to data +

Geometry

Think about making plots like using recipes from a cookbook: https://r-graphics.org/

Building a Histogram

ggplot(penguins) +
  aes(x = bill_length_mm) +
  geom_histogram()

Data +

Mapping to data +

Geometry

Taking it one part at a time (Data)

ggplot(penguins)

Taking it one part at a time (Aesthetics)

ggplot(penguins) +
  aes(x = bill_length_mm)

Taking it one part at a time (Geometry)

ggplot(penguins) +
  aes(x = bill_length_mm) +
  geom_histogram()

Histogram

ggplot(penguins) + aes(x = bill_length_mm) + geom_histogram()

Scatterplot

ggplot(penguins) +
  aes(x = bill_length_mm, y = bill_depth_mm) +
  geom_point()

Scatterplot (data)

ggplot(penguins)

Scatterplot (aesthetics)

ggplot(penguins) +
  aes(x = bill_length_mm, y = bill_depth_mm)

Scatterplot (geometry)

ggplot(penguins) +
  aes(x = bill_length_mm, y = bill_depth_mm) +
  geom_point()

What about more than two variables?

Three Variables

  • x=bill_length_mm
  • y=bill_depth_mm
  • color=species

ggplot(penguins) + aes(x = bill_length_mm, y = bill_depth_mm, color = species) + geom_point()

Multivariate Scatterplot by facet

ggplot(penguins) + aes(x = bill_length_mm, y = bill_depth_mm) + geom_point() + facet_wrap(~species)

Bar plots

Bar plots are made for categorical data. geom_bar() counts the rows in each group for you, so you only need to map one variable (the x axis).

ggplot(penguins) + aes(x = species) + geom_bar()
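
To see the counting that geom_bar() does, the bar heights can be compared with counts taken straight from the data. This is a sketch, assuming penguins comes from the palmerpenguins package; the names p and bar_heights are only illustrative.

```r
library(ggplot2)
library(palmerpenguins)  # assumption: source of `penguins`

p <- ggplot(penguins) + aes(x = species) + geom_bar()

# layer_data() returns the data frame ggplot2 computed for a layer;
# its `count` column holds the height of each bar.
bar_heights <- layer_data(p)$count

# The same counts computed directly from the data
species_counts <- table(penguins$species)
species_counts

# TRUE when every bar height matches the direct count
all(bar_heights == as.vector(species_counts))
```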

Bar plots, providing both axes

Alternatively, you can compute the counts yourself and map both axes:

penguins_grouped = group_by(penguins, species)
penguins_summary = summarise(penguins_grouped, n_species = n())
penguins_summary

# A tibble: 3 × 2
  species   n_species
  <fct>         <int>
1 Adelie          152
2 Chinstrap        68
3 Gentoo          124

ggplot(penguins_summary) + aes(x = species, y = n_species) + geom_bar(stat = "identity")
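
A shorter way to write the same plot, offered as a sketch rather than as the course's preferred style: dplyr's count() does the grouping and counting in one call, and geom_col() draws bars whose heights come from the y aesthetic. It assumes penguins comes from the palmerpenguins package.

```r
library(ggplot2)
library(dplyr)
library(palmerpenguins)  # assumption: source of `penguins`

# count() groups by species and counts the rows in one step;
# `name` sets the name of the count column.
penguins_summary <- count(penguins, species, name = "n_species")

# geom_col() uses the y values as bar heights
p_col <- ggplot(penguins_summary) +
  aes(x = species, y = n_species) +
  geom_col()

p_col
```

Both versions draw the same bars, so the choice between them is a matter of which reads more clearly to you.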

Histogram with a plot theme

ggplot(penguins) + aes(x = bill_length_mm) + geom_histogram() + theme_bw()

Histogram with options

ggplot(penguins) + aes(x = bill_length_mm) + geom_histogram(binwidth = 5)
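
The binwidth option sets how wide each bar is in the units of the data (millimeters here). When neither binwidth nor bins is given, ggplot2 falls back to 30 bins and prints a message suggesting you pick a value. The sketch below shows the common ways to control binning, assuming penguins comes from the palmerpenguins package; the object names are illustrative.

```r
library(ggplot2)
library(palmerpenguins)  # assumption: source of `penguins`

p_base <- ggplot(penguins) + aes(x = bill_length_mm)

# Fixed bin width: every bar spans 2 mm of bill length
p_width <- p_base + geom_histogram(binwidth = 2)

# Fixed number of bins: ggplot2 works out the width
p_bins <- p_base + geom_histogram(bins = 20)

# na.rm = TRUE drops rows with a missing bill length without a warning
p_quiet <- p_base + geom_histogram(binwidth = 2, na.rm = TRUE)

p_width
```

It is worth trying a few widths, since bins that are too wide hide structure and bins that are too narrow make the plot noisy.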

Boxplot

ggplot(penguins) + aes(x = species, y = bill_depth_mm) + geom_boxplot()

Grouped Boxplot

ggplot(penguins) + aes(x = species, y = bill_depth_mm, color = island) + geom_boxplot()
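
In a boxplot, color changes the outline of each box, while fill shades its interior. A minimal variation on the plot above, assuming penguins comes from the palmerpenguins package:

```r
library(ggplot2)
library(palmerpenguins)  # assumption: source of `penguins`

# Same grouping as before, but islands are told apart by the box interior
p_fill <- ggplot(penguins) +
  aes(x = species, y = bill_depth_mm, fill = island) +
  geom_boxplot()

p_fill
```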

Some additional options

ggplot(data = penguins) +
  aes(x = bill_length_mm, y = bill_depth_mm, color = species) +
  geom_point() +
  labs(
    x = "Bill Length",
    y = "Bill Depth",
    title = "Comparison of penguin bill length and bill depth across species"
  ) +
  scale_x_continuous(limits = c(30, 60))

Summary of options

data


geom_point: x, y, color, shape

geom_line: x, y, group, color

geom_histogram: x, y, fill

geom_bar: x, fill

geom_boxplot: x, y, fill, color


facet_wrap


labs

scale_x_continuous

scale_y_continuous

scale_x_discrete

scale_y_discrete
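
geom_line appears in the summary but not in the earlier examples. The sketch below is one way to use it with the same data: it averages body mass per species and year, then draws one line per species. It assumes penguins comes from the palmerpenguins package; mass_by_year and mean_mass are names made up for this example.

```r
library(ggplot2)
library(dplyr)
library(palmerpenguins)  # assumption: source of `penguins`

# One row per species and year; na.rm = TRUE skips missing body masses
penguins_grouped <- group_by(penguins, species, year)
mass_by_year <- summarise(
  penguins_grouped,
  mean_mass = mean(body_mass_g, na.rm = TRUE),
  .groups = "drop"
)

# group = species tells geom_line() which points to connect
p_line <- ggplot(mass_by_year) +
  aes(x = year, y = mean_mass, color = species, group = species) +
  geom_line() +
  scale_x_continuous(breaks = unique(mass_by_year$year)) +
  labs(x = "Year", y = "Mean body mass (g)", color = "Species")

p_line
```

The pattern is the same as in the other plots: data, then the mapping, then the geometry, with labs and a scale added at the end.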

esquisse as a helper

Consider the esquisse package to help generate your ggplot code via drag and drop.

library(esquisse)

esquisser(penguins)

R Graphics Cookbook

An excellent resource: https://r-graphics.org/